Confidence Sets for Split Points in Decision Trees

Authors

  • Ian W. McKeague
Abstract

We investigate the problem of finding confidence sets for split points in decision trees (CART). Our main results establish the asymptotic distribution of the least squares estimators and some associated residual sum of squares statistics in a binary decision tree approximation to a smooth regression curve. Cube-root asymptotics with nonnormal limit distributions are involved. We study various confidence sets for the split point, one calibrated using the subsampling bootstrap, and others calibrated using plug-in estimates of some nuisance parameters. The performance of the confidence sets is assessed in a simulation study. A motivation for developing such confidence sets comes from the problem of phosphorus pollution in the Everglades. Ecologists have suggested that split points provide a phosphorus threshold at which biological imbalance occurs, and the lower endpoint of the confidence set may be interpreted as a level that is protective of the ecosystem. This is illustrated using data from a Duke University Wetlands Center phosphorus dosing study in the Everglades.
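The two main ingredients described above, a least squares fit of a single split to the regression curve and a confidence set calibrated by the subsampling bootstrap, can be sketched in a few lines. The Python code below is a minimal illustration under simplifying assumptions, not the authors' implementation: it assumes a single covariate, uses the observed covariate values as candidate splits, and takes the cube-root rate n^(1/3) as given; the function names, the subsample size b, and the number of subsamples are illustrative choices.

```python
import numpy as np

def split_point_estimate(x, y):
    """Least squares split point for a single binary split (tree stump).

    For each candidate split s the fit is piecewise constant:
    mean(y | x <= s) on the left and mean(y | x > s) on the right.
    The estimate minimizes the residual sum of squares over s.
    """
    candidates = np.unique(x)[1:-1]        # interior observed values
    best_s, best_rss = candidates[0], np.inf
    for s in candidates:
        left, right = y[x <= s], y[x > s]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_s, best_rss = s, rss
    return best_s

def subsampling_ci(x, y, b, alpha=0.05, n_sub=500, seed=0):
    """Subsampling-bootstrap confidence interval for the split point.

    The empirical distribution of b**(1/3) * (tau_hat_b - tau_hat_n),
    over subsamples of size b drawn without replacement, approximates
    that of n**(1/3) * (tau_hat_n - tau), reflecting the cube-root rate.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    tau_n = split_point_estimate(x, y)
    roots = np.empty(n_sub)
    for i in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)
        roots[i] = b ** (1 / 3) * (split_point_estimate(x[idx], y[idx]) - tau_n)
    lo, hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return tau_n - hi / n ** (1 / 3), tau_n - lo / n ** (1 / 3)
```

In the Everglades application, the lower endpoint of such an interval is the quantity of ecological interest, since it is interpreted as a phosphorus level that is still protective of the ecosystem.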

Similar articles

School of IT Technical Report: Decision Trees for Imbalanced Data Sets

We propose a new variant of decision tree for imbalanced classification. Decision trees use a greedy approach based on information gain to select the attribute to split. We express information gain in terms of confidence and show that, like confidence, information gain is biased towards the majority class. We overcome the bias of information gain by embedding a new measure, the ratio of confide...
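As background to the bias argument in this report, note that standard information gain can indeed be written entirely in terms of the per-branch class proportions, i.e. the confidences p(class | branch), weighted by branch size. The Python snippet below illustrates only that generic computation; it is not the ratio-based measure proposed in the report, whose definition is cut off above.

```python
import numpy as np

def entropy(labels):
    """Entropy of a node, written via the class 'confidences' p(c | node),
    i.e. the class proportions among the labels reaching the node."""
    _, counts = np.unique(labels, return_counts=True)
    conf = counts / counts.sum()
    return -(conf * np.log2(conf)).sum()

def information_gain(y, left_mask):
    """Information gain of splitting labels y into y[left_mask] and y[~left_mask]:
    parent entropy minus the size-weighted entropies of the two branches."""
    y_left, y_right = y[left_mask], y[~left_mask]
    w = len(y_left) / len(y)
    return entropy(y) - (w * entropy(y_left) + (1 - w) * entropy(y_right))
```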

Using Pairs of Data-Points to Define Splits for Decision Trees

Conventional binary classification trees such as CART either split the data using axis-aligned hyperplanes or they perform a computationally expensive search in the continuous space of hyperplanes with unrestricted orientations. We show that the limitations of the former can be overcome without resorting to the latter. For every pair of training data-points, there is one hyperplane that is orth...
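The truncated sentence appears to describe the hyperplane orthogonal to the line joining each pair of training points. The sketch below additionally assumes the hyperplane passes through the pair's midpoint, as for a two-point nearest-neighbour boundary; that placement is an assumption on my part, since the blurb is cut off.

```python
import numpy as np

def pair_split(x_i, x_j):
    """Hyperplane orthogonal to the segment joining data points x_i and x_j.

    Assumed placement (not stated in the truncated text): through the
    midpoint of the pair. Returns (w, b) so that the split sends a point x
    to the left branch when w @ x + b <= 0.
    """
    w = x_j - x_i                      # normal direction of the hyperplane
    b = -w @ ((x_i + x_j) / 2.0)       # offset so the midpoint lies on the plane
    return w, b

def apply_split(X, w, b):
    """Boolean mask selecting the rows of X on the x_i side of the hyperplane."""
    return X @ w + b <= 0
```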

Using Turning Point Detection to Obtain Better Regression Trees

The issue of detecting optimal split points for linear regression trees is examined. A novel approach called Turning Point Regression Tree Induction (TPRTI) is proposed which uses turning points to identify the best split points. When this approach is used, first, a general trend is derived from the original dataset by dividing the dataset into subsets using a sliding window approach and a cent...
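The description of the trend step is cut off above, so the sketch below is only a generic reading of it: each sliding window is summarised by its centroid, and turning points are flagged where the slope between consecutive centroids changes sign. The window and step sizes, and the sign-change rule, are illustrative assumptions rather than the TPRTI algorithm itself.

```python
import numpy as np

def window_centroids(x, y, window=20, step=10):
    """Summarise (x, y) by the centroid of each sliding window of the
    x-sorted data (one possible reading of the truncated description)."""
    order = np.argsort(x)
    xs, ys = np.asarray(x)[order], np.asarray(y)[order]
    pts = []
    for start in range(0, len(xs) - window + 1, step):
        sl = slice(start, start + window)
        pts.append((xs[sl].mean(), ys[sl].mean()))
    return np.array(pts)

def turning_points(centroids):
    """Indices of centroids where the slope of the piecewise-linear trend
    changes sign; a simple stand-in for turning point detection."""
    slopes = np.diff(centroids[:, 1]) / np.diff(centroids[:, 0])
    return np.where(np.sign(slopes[1:]) != np.sign(slopes[:-1]))[0] + 1
```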

Statistical Preprocessing for Decision Tree Induction

Some apparently simple numeric data sets cause significant problems for existing decision tree induction algorithms, in that no method is able to find a small, accurate tree, even though one exists. One source of this difficulty is the goodness measures used to decide whether a particular node represents a good way to split the data. This paper points out that the commonly-used goodness measures are...

Journal title:

Volume, Issue:

Pages:

Publication date: 2006